Long-read sequence assembly of the gorilla genome.

Authors

  • David Gordon
  • John Huddleston
  • Mark J P Chaisson
  • Christopher M Hill
  • Zev N Kronenberg
  • Katherine M Munson
  • Maika Malig
  • Archana Raja
  • Ian Fiddes
  • LaDeana W Hillier
  • Christopher Dunn
  • Carl Baker
  • Joel Armstrong
  • Mark Diekhans
  • Benedict Paten
  • Jay Shendure
  • Richard K Wilson
  • David Haussler
  • Chen-Shan Chin
  • Evan E Eichler
Abstract

Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome.

Related articles

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...

A Genome-Wide Survey of Genetic Variation in Gorillas Using Reduced Representation Sequencing

All non-human great apes are endangered in the wild, and it is therefore important to gain an understanding of their demography and genetic diversity. Whole genome assembly projects have provided an invaluable foundation for understanding genetics in all four genera, but to date genetic studies of multiple individuals within great ape species have largely been confined to mitochondrial DNA and ...

Accurate Long-Read Alignment using Similarity Based Multiple Pattern Alignment and Prefix Tree Indexing

Ongoing research in sequencing technology has yielded machines that are able to produce sequence data on the order of one billion base pairs (bp) per machine day with an average read length of less than 100 bp per read (“short reads”). In the past two years, many efficient algorithms have been developed for short-read alignment against a reference genome and for genome assembly, for an o...

Haplotype and Repeat Separation in Long Reads

Resolving the correct structure and succession of highly similar sequence stretches is one of the main open problems in genome assembly. For non-haploid genomes this includes determining the sequences of the different haplotypes. For all but the smallest genomes it also involves separating different repeat instances. In this paper we discuss methods for resolving such problems in third generati...

The use of Oxford Nanopore native barcoding for complete genome assembly

Background: The Oxford Nanopore Technologies MinION(TM) is a mobile DNA sequencer that can produce long read sequences with a short turn-around time. Here we report the first demonstration of single contig genome assembly using Oxford Nanopore native barcoding when applied to a multiplexed library of 12 samples and combined with existing Illumina short read data. This paves the way for the closu...

Journal:
  • Science

Volume 352, Issue 6281

Pages: -

Publication date: 2016